Introduction

In this Rmarkdown document we are going to integrate and cluster the merged Seurat objects. We aim to 1st annotate the main regions in the tissue, so that later we can subset to them and gain more resolution. To carry out the integration we are going to use Harmony since it is a recommeded integration method by the paper A benchmark of batch-effect correction methods for single-cell RNA sequencing data.

The vignette can be found here

Libraries

library(dplyr)
library(ggplot2)
library(Seurat)
library(harmony)

Parameters

Here we will define and load objects and functions that will be used throughout the document.

set.seed(123)
source(here::here("misc/paths.R"))
source(here::here("utils/bin.R"))

"{clust}/{plt_dir}" %>%
  glue::glue() %>%
  here::here() %>%
  dir.create(path = .,
             showWarnings = FALSE,
             recursive = TRUE)

"{clust}/{robj_dir}" %>%
  glue::glue() %>%
  here::here() %>%
  dir.create(path = .,
             showWarnings = FALSE,
             recursive = TRUE)

Load data

The data used in this Rmarkdown document comes from 02-QC_common.Rmd where the data was QC’d.

merged_se <- purrr::map(id_sp_df$gem_id, function(x) {
  tmp_se <- "{qc}/{robj_dir}/qc_se_{x}.rds" %>%
    glue::glue() %>%
    here::here() %>% 
    readRDS(file = .)
  
  tmp_se[["gem_id"]] <- x
  tmp_se[["donor_id"]] <- id_sp_df[id_sp_df$gem_id == x, ] %>% dplyr::pull(donor_id)
  tmp_se
}) %>% 
  purrr::reduce(., merge)

Analysis

Preprocessing

Preprocess data

merged_se <- Seurat::NormalizeData(merged_se,
                                   verbose = FALSE) %>%
  Seurat::FindVariableFeatures(., verbose = FALSE) %>%
  Seurat::ScaleData(., verbose = FALSE) %>%
  Seurat::RunPCA(., verbose = FALSE)

Lets see if the PCs merge well using the uncorrected PCs, we can see how there are clear differences due to the batch (donor).

p1 <- Seurat::DimPlot(object = merged_se,
                       reduction = "pca",
                      pt.size = .1,
                      group.by = "donor_id")

p2 <- Seurat::VlnPlot(object = merged_se,
              features = "PC_1",
              group.by = "donor_id",
              pt.size = .1)
p1 + p2

Lets also check how they look in a UMAP embedding prior to integration

merged_se <- Seurat::RunUMAP(
  merged_se,
  reduction = "pca",
  dims = 1:30,
  verbose = FALSE)

merg_pca <- Seurat::DimPlot(
  merged_se,
  group.by = "donor_id",
  pt.size = 0.1,
  reduction = "pca")

merg_umap <- Seurat::DimPlot(
  merged_se,
  group.by = "donor_id",
  pt.size = 0.1,
  reduction = "umap")

merg_pca | merg_umap

We can appreciate how they are clearly separate and require integration.

Harmony Integration

Seurat::ElbowPlot(merged_se)

# After looking at the Elbow plot we see the elbow at around 5 PCs so we are going to be a bit lax and use the top 15
merged_se <- harmony::RunHarmony(merged_se,
                                 group.by.vars = "donor_id",
                                 assay.use = "Spatial",
                                 reduction = "pca",
                                 dims = 1:20,
                                 plot_convergence = TRUE)

Clustering + non-linear dimensionality embedding

merged_se <- Seurat::FindNeighbors(merged_se,
                                   reduction = "harmony",
                                   dims = 1:20,
                                   verbose = FALSE) %>% 
  Seurat::FindClusters(
    resolution = c(0.01, 0.05, 0.1, 0.3,
                   0.8, 1, 1.25, 1.5))
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 16224
## Number of edges: 599760
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9900
## Number of communities: 1
## Elapsed time: 2 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 16224
## Number of edges: 599760
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9671
## Number of communities: 4
## Elapsed time: 3 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 16224
## Number of edges: 599760
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9458
## Number of communities: 5
## Elapsed time: 3 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 16224
## Number of edges: 599760
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8995
## Number of communities: 9
## Elapsed time: 3 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 16224
## Number of edges: 599760
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8452
## Number of communities: 17
## Elapsed time: 3 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 16224
## Number of edges: 599760
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8283
## Number of communities: 18
## Elapsed time: 3 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 16224
## Number of edges: 599760
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8094
## Number of communities: 21
## Elapsed time: 3 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 16224
## Number of edges: 599760
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7939
## Number of communities: 24
## Elapsed time: 3 seconds
merged_se <- Seurat::RunUMAP(merged_se,
                             reduction = "harmony",
                             dims = 1:20,
                             verbose = FALSE)

1st we want to see if the datasets are well merged after scaling and normalizing the data. All the data comes from the same flowcell and the same Visium slide.

Seurat::DimPlot(object = merged_se,
                group.by = c("donor_id"))

donor_metadata <- readr::read_csv(
  file = here::here("../data/tonsil_atlas_donor_metadata.csv"))

id_comb <- readr::read_tsv(
  file = here::here("01-spaceranger/data/donor_ids.txt")) %>%
  dplyr::rename(
    sample_id = BCLL_ID,
    donor_id = Old_names
    )

se_metadata <- merged_se@meta.data %>%
  dplyr::mutate(sample_id = stringr::str_replace(string = sample_id,
                                                 pattern = "_",
                                                 replacement = "-")) %>%
  tibble::rownames_to_column("barcode")

donor_metadata2 <- donor_metadata %>%
  dplyr::left_join(id_comb, by = "donor_id") %>%
  dplyr::select(-donor_id)


merged_se@meta.data <- se_metadata %>%
  dplyr::left_join(donor_metadata2,
                    by = "sample_id") %>%
  tibble::column_to_rownames("barcode")

unique(merged_se@meta.data$sampltonsile_id)
## NULL
table(merged_se@meta.data$age, merged_se@meta.data$sample_id, useNA = "ifany")
##                  
##                   BCLL-10-T BCLL-11-T BCLL-12-T BCLL-13-T BCLL-14-T BCLL-2-T BCLL-8-T BCLL-9-T
##   3y11m,BCLL-10-T      3079         0         0         0         0        0        0        0
##   3y1m,BCLL-12-T          0         0      3089         0         0        0        0        0
##   4y11m,BCLL-8-T          0         0         0         0         0        0     1422        0
##   5y11m,BCLL-13-T         0         0         0       913         0        0        0        0
##   5y6m,BCLL-11-T          0      2012         0         0         0        0        0        0
##   5y9m,BCLL-9-T           0         0         0         0         0        0        0     2170
##   65y,BCLL-2-T            0         0         0         0         0     2194        0        0
##   <NA>                    0         0         0         0      1345        0        0        0

Visualize the clustering / Split by donors so we can see if there are clusters exclusive to one sample/group

Seurat::DimPlot(merged_se,
               group.by = c("Spatial_snn_res.0.01",
                            "Spatial_snn_res.0.05",
                            "Spatial_snn_res.0.1",
                            "Spatial_snn_res.0.3",
                            "Spatial_snn_res.0.8",
                            "Spatial_snn_res.1",
                            "Spatial_snn_res.1.25",
                            "Spatial_snn_res.1.5"),
               pt.size = 0.1,
               reduction = "umap",
               label = TRUE,
               # split.by = "age",
               ncol = 4)

Save object

"{clust}/{robj_dir}/integrated_spatial.rds" %>%
  glue::glue() %>%
  here::here() %>%
  saveRDS(object = merged_se, file = .)

Save shiny

Function

#' This function takes in a Seurat 3.0 object and returns a named list with 2
#' objects formated to be loaded in the ShinyApp:
#'
#'      1. Metadata + coordinates of the 2D embedding.
#'      2. Expression matrix of the selected slot.
#'
#' ShinyApp: https://github.com/Single-Cell-Genomics-Group-CNAG-CRG/shiny-annotation/blob/main/seurat2shiny.R
#'
#' @param object Object of class Seurat. Mandatory.
#' @param tech Character string. Specify the technology 
#' @param assay Character string. Assay within the Seurat object from which to extract the expression matrix. Default: active assay.
#' @param slot Character string. Slot containing the expression matrix. Default: data.
#' @param reduction Character string. Dimensionality reduction from which to extract the 2D coordinates. Default: umap.
#' @param image Character string or NULL. When tech is sp, name of the image from which we want to extract the coordinates as found in names(object/@images), by default NULL.
#' @param asfactors Character vector. Metadata columns to be converted to factors. Default: NULL.
#' @param save Logical value. Save metadata and expression matrix objects as RDS files. Default: FALSE
#' @param path Character string. Path to save output files if 'save = TRUE'. Default: working directory.
#'
#' @return Named list containing the joint metadata + 2D embedding and the expression matrix.
#'
#' @examples
#' seurat2shiny( object = seurat_object, asfactors = c("plate", "replicate") )
#'
#' shiny_list = seurat2shiny(object = seurat_object)
#'
#' @export

seurat2shiny = function(
    object                         ,
    tech      = c("sc", "sp")      ,
    assay     = object@active.assay,
    slot      = "data"             ,
    reduction = "umap"             ,
    image     = NULL               ,
    asfactors = NULL               ,
    save      = FALSE               ,
    path      = "."                  # path = getwd()
) {
    suppressMessages( library(Seurat) );

    # Input check.
    if ( ! is(object, "Seurat") )
        stop("'object' is not a Seurat object.");

    if ( ! assay %in% Seurat::Assays(object) )
        stop("'assay' not in the Seurat object's available assays.");

    if ( tech == "sc" & ! (reduction %in% names(object@reductions)) )
        stop("'reduction' not in the Seurat object's available reductions.");

    if ( ! slot %in% c("counts", "data", "scale.data") )
        stop("'slot' not in the Seurat object's available slots.");
    
    if ( ! tech %in% c("sc", "sp") )
        stop("tech must be sc or sp.");
    
    
    # Check Which technology it is processing
    if (tech == "sc") {
        # Extract 2D coordinates. Keep only first 2 dimensions, remove the rest if any.
        embeds <- as.data.frame(object@reductions[[reduction]]@cell.embeddings)[1:2];
        names(embeds) <- c("coord_x", "coord_y");
    } else if (tech == "sp") {
        # If the image is null select the first one
        if (is.null(image)) {
            image <- names(object@images)[1]
            warning(sprintf("image is not set, we will use %s", image))
        } 
        
        embeds <- data.frame(object@images[[image]]@coordinates[, c("imagerow", "imagecol")])
        colnames(embeds) <- c("coord_y", "coord_x");
        
        # Inverse coord_y
        embeds$coord_y <- - embeds$coord_y
    }
    

    # Join metadata with coordinates.
    metadata <- object@meta.data;

    for (col in asfactors) {
        metadata[[col]] <- as.factor(metadata[[col]]);
    };

    metadata <- merge(x = metadata, y = embeds, by = "row.names");
    names(metadata)[1] <-  "barcode"; # names(metadata)[names(metadata) == "Row.names"] = "barcode";
    rownames(metadata) <- metadata$barcode
    metadata$barcode <- as.character(metadata$barcode)
    # Reset barcode order which is shuffled in merge
    metadata <- metadata[rownames(object@meta.data), ]
    
    # Extract expression data.
    # expression = as.matrix( Seurat::GetAssayData(object = object, slot = slot, assay = assay) );
    expression = Seurat::GetAssayData(object = object, slot = slot, assay = assay);

    if ( ! identical( as.character(metadata$barcode), colnames(expression) ) )
        warning("Cells in metadata and expression matrix do not match.");

    if (save) {
        saveRDS( object = metadata  , file = paste0(path, "/metadata.rds"  ) );
        saveRDS( object = expression, file = paste0(path, "/expression.rds") );
    };

    invisible(
        list(metadata = metadata, expression = expression)
    );
}

Loop through all the images

lapply(Seurat::Images(merged_se), function(i) {
  
  se_sub <- subset(merged_se, subset = gem_id == i)
  se_sub@images <- se_sub@images[Seurat::Images(se_sub) == i]

  shiny_ls <- seurat2shiny(
    object = se_sub,
    tech = "sp",
    assay = "Spatial",
    slot = "data",
    image = i)
  
  # Save metadata
  "{clust}/{robj_dir}/shiny_metadata_{i}.rds" %>%
    glue::glue() %>%
    here::here() %>%
    saveRDS(object = shiny_ls[[1]], file = .)
  
  # Save expression matrix
  "{clust}/{robj_dir}/shiny_expression_{i}.rds" %>%
    glue::glue() %>%
    here::here() %>%
    saveRDS(object = shiny_ls[[2]], file = .)
    
})
## [[1]]
## NULL
## 
## [[2]]
## NULL
## 
## [[3]]
## NULL
## 
## [[4]]
## NULL
## 
## [[5]]
## NULL
## 
## [[6]]
## NULL
## 
## [[7]]
## NULL
## 
## [[8]]
## NULL

Session Info

sessionInfo()
## R version 4.0.1 (2020-06-06)
## Platform: x86_64-conda_cos6-linux-gnu (64-bit)
## Running under: Red Hat Enterprise Linux Server release 6.7 (Santiago)
## 
## Matrix products: default
## BLAS/LAPACK: /scratch/groups/hheyn/software/anaconda3/envs/spatial_r/lib/libopenblasp-r0.3.12.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=es_ES.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=es_ES.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] harmony_1.0        Rcpp_1.0.7         SeuratObject_4.0.4 Seurat_4.0.5       ggplot2_3.3.5      dplyr_1.0.7       
## 
## loaded via a namespace (and not attached):
##   [1] Rtsne_0.15            colorspace_2.0-2      deldir_1.0-6          ellipsis_0.3.2        ggridges_0.5.3        rprojroot_2.0.2       spatstat.data_2.1-0   farver_2.1.0          leiden_0.3.9          listenv_0.8.0         bit64_4.0.5           ggrepel_0.9.1         RSpectra_0.16-0       fansi_0.5.0           codetools_0.2-18      splines_4.0.1         knitr_1.36            polyclip_1.10-0       jsonlite_1.7.2        ica_1.0-2             cluster_2.1.2         png_0.1-7             uwot_0.1.11           shiny_1.7.1           sctransform_0.3.2     spatstat.sparse_2.0-0 readr_2.1.1           compiler_4.0.1        httr_1.4.2            assertthat_0.2.1      Matrix_1.3-4          fastmap_1.1.0         lazyeval_0.2.2        cli_3.1.0             later_1.3.0           htmltools_0.5.2       tools_4.0.1           igraph_1.2.9          gtable_0.3.0          glue_1.5.1            RANN_2.6.1            reshape2_1.4.4        scattermore_0.7       jquerylib_0.1.4       vctrs_0.3.8           nlme_3.1-153          lmtest_0.9-39         xfun_0.28             stringr_1.4.0         globals_0.14.0        mime_0.12             miniUI_0.1.1.1        lifecycle_1.0.1       irlba_2.3.4          
##  [55] goftest_1.2-3         future_1.23.0         MASS_7.3-54           zoo_1.8-9             scales_1.1.1          vroom_1.5.7           spatstat.core_2.3-2   hms_1.1.1             promises_1.2.0.1      spatstat.utils_2.2-0  parallel_4.0.1        RColorBrewer_1.1-2    yaml_2.2.1            reticulate_1.22       pbapply_1.5-0         gridExtra_2.3         sass_0.4.0            rpart_4.1-15          stringi_1.7.6         highr_0.9             rlang_0.4.12          pkgconfig_2.0.3       matrixStats_0.61.0    evaluate_0.14         lattice_0.20-45       ROCR_1.0-11           purrr_0.3.4           tensor_1.5            labeling_0.4.2        patchwork_1.1.1       htmlwidgets_1.5.4     bit_4.0.4             cowplot_1.1.1         tidyselect_1.1.1      here_1.0.1            parallelly_1.29.0     RcppAnnoy_0.0.19      plyr_1.8.6            magrittr_2.0.1        R6_2.5.1              generics_0.1.1        DBI_1.1.1             pillar_1.6.4          withr_2.4.3           mgcv_1.8-38           fitdistrplus_1.1-6    survival_3.2-13       abind_1.4-5           tibble_3.1.6          future.apply_1.8.1    crayon_1.4.2          KernSmooth_2.23-20    utf8_1.2.2            spatstat.geom_2.3-0  
## [109] plotly_4.10.0         tzdb_0.2.0            rmarkdown_2.11        grid_4.0.1            data.table_1.14.2     digest_0.6.29         xtable_1.8-4          tidyr_1.1.4           httpuv_1.6.3          munsell_0.5.0         viridisLite_0.4.0     bslib_0.3.1